fix(partial-rollout): cap max_new_tokens by prior response length by none0663 · Pull Request #2122 · THUDM/slime

none0663 · 2026-06-23T04:30:06Z

Summary

In partial rollout, an aborted sample is resubmitted with its previously
generated tokens already attached (sample.response_length > 0). However,
sampling_params["max_new_tokens"] was still set to the full
rollout_max_response_len, allowing the engine to generate another full
budget of tokens. As a result, the total response length could exceed
rollout_max_response_len (nothing downstream clamps it).

This change subtracts the already-generated length (sample.response_length)
from max_new_tokens so the cumulative response stays within
rollout_max_response_len. When the budget is already exhausted,
max_new_tokens becomes 0 and the existing guard marks the sample TRUNCATED.
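
The budget arithmetic described above can be sketched as follows. This is a minimal illustration, not the code from the diff: the Sample dataclass and the cap_sampling_params helper are hypothetical stand-ins, and the clamp to 0 via max() is an assumption made here so that an over-long prior response cannot produce a negative value. Only sample.response_length, sampling_params["max_new_tokens"], and rollout_max_response_len come from the description.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical stand-in for the rollout sample; only the field used here.
    response_length: int = 0


def cap_sampling_params(
    sampling_params: dict, sample: Sample, rollout_max_response_len: int
) -> dict:
    """Return a copy of sampling_params whose max_new_tokens is the remaining budget."""
    # Tokens still available after the previously generated response tokens.
    # Clamping at 0 is an assumption of this sketch.
    remaining = max(rollout_max_response_len - sample.response_length, 0)
    capped = dict(sampling_params)  # leave the caller's dict untouched
    capped["max_new_tokens"] = remaining
    return capped


base_params = {"max_new_tokens": 1024, "temperature": 1.0}
fresh = cap_sampling_params(base_params, Sample(response_length=0), 1024)
resumed = cap_sampling_params(base_params, Sample(response_length=300), 1024)
exhausted = cap_sampling_params(base_params, Sample(response_length=1024), 1024)
```

With a budget of 1024, a fresh sample keeps max_new_tokens at 1024, a sample resumed after 300 tokens gets 724, and a sample that already used the full budget gets 0, which is the case the existing guard is described as marking TRUNCATED.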

Changes

rollout/sglang_rollout.py (generate)
rollout/sglang_streaming_rollout.py (generate_streaming)

Only affects runs with --partial-rollout and samples that already have
response tokens; fresh samples are unchanged.

none0663 added 2 commits June 23, 2026 12:10

fix partial-rollout: cap max_new_tokens by prior response length

f2fc681

Merge branch 'main' into fix-partial-rollout-rollout-max-response

2888b79

none0663 wants to merge 2 commits into THUDM:main from none0663:fix-partial-rollout-rollout-max-response
